Optimum bias for fast-switching free energy calculations 
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We derive the bias function that minimizes the statistical error of free energy differences calculated 
in work-biased fast-switching simulations. The optimum bias function is compared to other bias 
functions using a particle pulled through a viscous fluid as an illustrative example. Our analysis 
indicates that the uncertainty in the free energy is smallest if both dominant and typical work values 
are sampled with high frequency. 



I. INTRODUCTION 

Fast switching computer simulations based on Jarzyn- 
ski's non-equilibrium work theorem offer an interesting 
way for the computation of free energies [J 0, 0] . In this 
approach, which is particularly relevant in the context of 
recent mechanical single molecule experiments 0, S @ ; 
the free energy difference Ai^ between two equilibrium 
states is related to the work W done on the system dur- 
ing non-equilibrium transformations P, Q , 



-I3AF 



(1) 



where (5 = \/k^T is the reciprocal temperature. The 
angular brackets (■ • • ) imply an average over many tra- 
jectories during which a control parameter is switched 
at a finite rate between values corresponding to the two 
equilibrium states. If the control parameter is switched 
slowly such that the system remains close to equilibrium 
at all times (in thermodynamics, this corresponds to a 
reversible transformation), the work done on the system 
differs little form the free energy difference, W k, AF. 
If, on the other hand, the control parameter is switched 
rapidly, the work values observed in different realizations 
of the switching process may vary over a large range 
and typically exceed the free energy difference. In other 
words, the work distribution P{W) is broad and peaked 
at work values larger than AF in this case. 

In fast switching computer simulations the exponen- 
tial average of Equ. ([T]) may create numerical problems 
particularly in the fast switching regime. The reason for 
these difficulties is best appreciated if one rewrites the 
Jarzynski equation ([T]) as integral over the work distri- 
bution P{W), 
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dW P{W)e 
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(2) 



For strong driving, the work distribution P(W) can have 
a very small overlap with the integrand P{W) exp{— PW) 
of the above equation. Accordingly, typical work val- 
ues from the peak of P{W) contribute little to the av- 
erage while the dominant work contributions to the av- 
erage are very rare Q. This results in large statistical 
errors of the free energy difference estimated from a fi- 
nite sample of non-equilibrium trajectories, an issue that 



has been addressed repeatedly in the recent literature 
S 0, i, i, [M El, 113,113, 113, [ll 13 m. One way to 

overcome this difficulty consists in favoring the sampling 
of trajectories with rare but important work values by in- 
troducing a work dependent bias function n(VF) p^.[l3j. 
The bias function guides the simulation, in which fast 
switching trajectories are harvested using transition path 
sampling methods [3, [l3| , toward the important regions 
of trajectory space. In this approach, Jarzynski's identity 
takes the form: 



-/3AF 



(e-^^/n)n 



(i/n>n 



(3) 



where the angular brackets (...)n denote averages over the 
biased path ensemble [13, [l3|- the present paper, we 
derive an expression for the bias function that minimizes 
the statistical error of the free energy estimate. When 
then test the bias function for a simple one-dimensional 
model and discuss implications for practical fast switch- 
ing simulations. 



II. OPTIMUM BIAS 

In a biased fast switching simulation with bias func- 
tion n(iy) the free energy difference AF is estimated 
according to Equ. ^ from a finite sample of N tra- 
jectories. Accordingly, the free energy estimate AFn is 
affected by a statistical error quantified by the fluctua- 
tions — {{AFn — AF)^), where the angular brackets 
denote an average over many realizations of the averaging 
process [13, 20, 21]. For large sample sizes N and statis- 
tically independent trajectories, the fluctuations are 
given by [l3i] 



where the unitless factor 



N 



(U) 
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n 



(4) 



(5) 



depends only on the work distribution P{W) and the 
bias function n(W^), but not on the sample size N. Also, 
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a^[n] equals the number NkT of trajectories required to 
obtain a free energy accuracy of kT. The argument of 
a^[n] emphasizes that the factor a^, which determines 
the size of the fluctuations for a given sample size N , is 
a functional of the bias function n(W^). 

We now determine the bias 11* that minimizes [H] 
and do so by requiring that upon an infinitesimal varia- 
tion (5n of the bias function 11* the variation of van- 
ishes, 



fo2[n*] = a2[n* + (5n] - a^p*] ^ o. 



(6) 



Expanding the right hand side of Equ. ([S]) and neglecting 
all terms of order ((511)^ and higher we obtain 



Ja^p*] = ((5n 



[n*] (e-/'(^-^^) - 1) 
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(7) 



Since i5a^[n*] has to vanish for any arbitrary variation 
(511, the expression in square brackets must be equal to 
zero. Solving for 11* then yields 



ir{w) = 



(n*)i/^ 



-/3(IV-AF) 
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(8) 



where the absolute value |...| is taken since the bias func- 
tion has to be non-negative. Although the fraction on 
the right hand side is a functional of the (at this point 
still unknown) optimum bias function H* , it does not ex- 
plicitly depend on the the work W and can therefore be 
treated as an irrelevant multiplicative constant. Thus, 
the optimum bias function can be written as 



n* 



~I3{W-AF) 



1 



(9) 



This equation is the main result of this paper. Remark- 
ably, the optimum bias function is very general and does 
not at all depend on the particular switching protocol 
used in the simulation. It does, however, depend on the 
unknown free energy difference Ai^ which limits the use- 
fulness of the optimum bias function in practice. For the 
optimum bias the fluctuations [H] take the particularly 
simple form 



(10) 



Thus, the statistical error in the free energy estimate 
can be calculated from a single integral over the work 
distribution. 

Due to the non-linearity of the logarithm, the expecta- 
tion value of the free energy difference estimated from a 
small sample does not coincide with the true free energy 
difference. The resulting bias, bn — {AFjy) ~ AF, is 
given by [13] 



-2/3(W-AF) 



1 



n 



(11) 



As noted earlier [8|, for certain bias functions and work 
distributions, the bias vanishes. In two situations. 



this is true also for simulations done with the optimum 
bias. Consider, first, a switching process for which the 
work distributions is identical to that of the reverse pro- 
cess. Then, the free energy difference AF — and it fol- 
lows from the Crooks theorem ^22,] that ((exp(— 2/31^) — 
1)/| exp(— /3iy) — 1|) =0 and hence the bias b^ also 
vanishes. The other instance in which b^ vanishes con- 
cerns processes with Gaussian work distributions, usually 
observed for slow switching (an exception are isolated 
systems in which adiabatic invariants prevent the work 
distribution from becoming Gaussian [l3|). In this case, 
the Jarzynski theorem implies that average work W and 
work variance cr^ are related by ct^ = 2{W — AF) / ^ [l|, 
and 6jv = can be demonstrated by direct evaluation of 
the integrals in Equ. (fTT|) . 



III. MODEL 

To illustrate the effect of the optimum bias we applied 
it to a one dimensional particle dragged through a viscous 
fluid by a harmonic trap of force constant k translated 
with constant speed v. In this case, the control parameter 
is the trap position, which changes by L during a time 
r = L/v. The particle, whose position is specified by the 
coordinate q, evolves according to the Langevin equation 
in the overdamped limit |23|], 



1 



(12) 



where the friction coefficient 7 is related to the delta- 
correlated Gaussian random noise -q by {ri{0)rj{t)) — 
2k'QT^~^5{t). For this model, the work distribution is 
Gaussian, with mean 



kL 



1) 



(13) 



and variance ct^ — 2k^TW 



24]. All following results 
were obtained for the parameters /? = l,7 = l,fc = l 
and L = 5. 



IV. RESULTS 

To visualize the effect of the optimum bias 11* (M^) we 
depict the work distribution P{W) for the particle in the 
harmonic trap together with P{W)e~^'^ , the integrand 
of Equ. (El), and P{W)I1*{W) in Fig. □ Up to a nor- 
malization factor, P{W)Il*{W) is identical to the work 
distribution Pq* (W) sampled in the biased ensemble, 

Pn^W) ^ P{W)n*{W)/ J dWP{W)U*{W). (14) 

For this model, Pn* (W) is symmetric around W — 0, 
as follows from the Crooks theorem [S^ for a process 
with identical work distributions in forward and back- 
ward direction. It is interesting to note that according 



3 



to the work distributions shown in Fig. ([T]), work values 
from the peaks of P{W) and P{W) exp{— PW) are sam- 
pled with the same frequency in the biased ensemble. As 
discussed in Sec. |Vl this property holds in general and 
carries ramifications for the design of bias functions. 




FIG. 1: Work distribution P{W) (solid line) along with the 
functions P{W)e-'^^ (dashed line) and P{W)W{W) (dash- 
dotted line) for the particle in the moving harmonic trap. 
These curves were obtained for the parameter set 13—1, 
7 = 1, fc = 1 and L = 5. 

We next calculate the statistical error of the free energy 
estimate as quantified by the factor a^, i.e., the number 
NkT of trajectories needed to obtain a free energy differ- 
ence accurate to fceT. To estimate the error for differ- 
ent bias functions, we do not carry out actual computer 
simulations of our model system. Rather, we calculate 
expected errors from the analytically known work distri- 
bution P{W) using Equ. ([5|). Since in our model the 
initial and final states correspond to two different posi- 
tion of the otherwise identical trap, the free energy dif- 
ference vanishes. The expression for the error, Equ. ([5]), 
therefore simplifies to: 

a'[U] = {U) (^^^^^y (15) 

Thus, for given bias function n(VF), the error can be 
easily calculated by integration. 

In addition to the optimum bias derived in this paper, 
we also examine the exponential bias 

ne(IF) ^e-''^/^ (16) 

suggested by Ytreberg and Zuckerman [l^l , as well as the 
inverse bias ni(l/F), which flattens the work distribution 
in the biased ensemble in the interval [Wmin, Wmax], 

r l/P(VF„,in) for VF<Ty,ni„, 
U,{W) = { 1/P{W) for W^in <W< W„,ax, 

[ l/P(W^max) for W > W^^ax. 

(17) 

To obtain a flat distribution in the important work range 
that includes the typical and dominant work values we 
choose VFmax,min = ±(/3o'^/2 + Auw) ■ The case without 
bias, i.e., n(PF) = 1, is also considered. 



For all bias functions discussed here the expected error 
can be calculated analytically from Equ. (|15p . For the 
optimum bias one obtains 

„.|n.l^4..,'(0). as, 

while for the exponential bias and the inverse bias the 
error is given by (l6j 

a" [n,] = 2eP"</^ (l - e-^"^"-/^) , (19) 

and 

a2[n,] - ^'"^^ - g^'"'" { 1 _ e-/3V?,./4) ^ (20) 

respectively. In calculating a^[ni] we have assumed 
that work values outside [Wmin, Wmax] do not signifi- 
cantly contribute to the integrals (note that in Ref. Q 
the boundaries Wmin and Wmax were selected incorrectly 
which lead to an erroneous a^pi] for small switching 
rates). Without any bias the expected error is 

a^P = 1] = e'^''^^ - 1. (21) 

The resulting values of are depicted in Fig. [5] as a 
function of the trap velocity v. The statistical error is 
indeed smallest for the optimum bias 11*. While the in- 
verse bias Hi also performs well over the whole range 
of trap velocities, large errors result for straightforward 
sampling (no bias) and the exponential bias lie at large 
trap velocities. For small trap velocities all bias func- 
tions lead to the same asymptotic behavior in which, as 
for straightforward fast switching without bias, the error 
depends linearly on the trap velocity as expected from 
linear response theory. 




FIG. 2: The squared fluctuations o? calculated for the particle 
in the harmonic trap as a function of the trap velocity v for 
the various bias functions described in the main text. These 
curves were obtained for the parameter set /3 = 1, 7 = 1, 
k = 1 and L — 5. 

To estimate the total computational cost of the sim- 
ulation we need to take into account that the cost of 
a single trajectory is approximately proportional to its 
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length (neglecting any overhead cost, for instance for the 
generation of initial conditions). Accordingly, we define 
the computational cost as the product of the number of 
trajectories N^t and their duration r in time [l3|, 

CcPv = NkTr^a^-. (22) 

V 

The computational effort Ccpu is the total CPU-time re- 
quired to obtain a free energy accurate to kT in units of 
the CPU time necessary to compute a trajectory of length 
1. As both N]^T and r depend on the switching rate (i.e., 
the trap velocity in our case), the benefit of short low- 
cost trajectories may be compensated by a large number 
of required trajectories or vice versa. The computational 
costs resulting from application of the various bias func- 
tions are shown in Fig. [3l For slow switching, the com- 
putational cost is constant and very similar for all bias 
functions implying that from an efficiency point of view it 
docs not matter if one generates a few long trajectories or 
more but shorter ones (as long as one stays in the linear 
regime) [25| . For larger switching rates, the computa- 
tional cost declines steadily with the switching rate pro- 
vided a good bias function is available. This result indi- 
cates that the instantaneous switching limit, in which the 
fast switc hing method turns into Zwanzig's perturbative 
approach l26|, yields the most efficient simulations. Note, 
however, that in this analysis we have neglected correla- 
tions, which possibly depend on the switching rate, and 
any computational overhead involved, for instance, in the 
generation of initial conditions. Taking such effects into 
account will abort the decline of Ccpu for large switch- 
ing rates and may well lead to a finite optimum switching 
rate, but huge savings in computer time are not expected 
in this case. 




V 



FIG. 3: Computational cost Ccpu calculated for the particle 
in the harmonic trap with parameters /3 = l,7=l,fc = l 
and L — 5. 

Since the optimum bias function depends on the un- 
known free energy difference (in fact, this is the quantity 
we wish to calculate), its application is not straightfor- 
ward. It is therefore interesting to examine the perfor- 
mance of bias functions of similar but more general form. 
For this purpose, we introduce the bias function 

fl{W) = \e-^>^^ - (l>\ + c, (23) 



which depends on the parameters a, 4> and c. In the case 
of the particle in the harmonic trap, this bias function is 
identical to the optimum bias function for a = 1, = 1 
and c — 0, while for a = 0.5, ^ = and c = it is 
equal to the exponential bias. The constant c is added 
to prevent a singularity in Equ. (fT5|) for (p > 0. (Note 
that for the optimum bias no singularity occurs even for 
c = 0.) Figure [5] shows a as a function of and a for 
the generalized bias II with a trap velocity of u = 1. The 
rather shallow minimum of a is indeed located at = 1 
and a = 1, the values corresponding to the optimum bias. 
Also for the exponential bias {(f> — 0, a = 0.5), the value 
of a is low, but it is a minimum only on the ^ = axis. 
Surprisingly, the a{(j), a)-surface has another minimum at 
w —1.55 and a « 1.1 of almost the same depth as that 
of the optimum bias function. This is a specific feature 
of the model and for other models the minimum of a will 
be located at different values of (j> and a. 




I 1 1 , , , , , 1 
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FIG. 4: Isolines of a{a, cp) as a function of the parameters a 
and (j) of the generalized bias Il(Vy) with c = 10~^ for the 
particle in the harmonic trap moving at velocity oi v = 1. 



V. CONCLUSION 

In this article we derived the optimum bias function 
that leads to the smallest errors in the free energy calcu- 
lated with work biased path sampling methods [l3| . For 
a one-dimensional model, we compared the accuracy ob- 
tained with various bias functions functions confirming 
that the optimum bias leads to the smallest statistical 
errors. Remarkably, the optimum bias function is of a 
very general form that is independent of the particular 
switching process (in contrast, the inverse bias 11^(1^) 
is strongly model dependent). The optimum bias does, 
however depend on the free energy difference, i.e., the 
very quantity one wants to calculate. This limits the 
practical applicability of the optimum bias. Neverthe- 
less, some general conclusions can be drawn from the 
form of the optimum bias function. 

According to Equs. ^ and (fT4|) . for the optimum bias 
function the work distribution in the biased ensemble is 
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given by 

Pn*{W)^\PB{-W)-P{W)\, (24) 

where we have used the Crooks theorem [l^l, Pr,{~W) — 
P{W)e-P^^-^^\ to relate the work distribution P{W) 
to that of the switching process carried out with time 
reversed switching protocol, Pfi{W). It has been noted 
before 0, H^j that Pfj(— W^) is peaked at the dominant 
work values, i.e, those work values that mostly contribute 
to the exponential work average of Equ. ([2]). Thus, it 
follows from Equ. ([M)) that both the dominant and the 
typical work values are sampled with the same frequency 
for the optimum bias. This is particularly apparent if 
the peaks of P{W) and Pr{—W) are far apart and the 
overlap between the two distributions is small. Then, 
Pti*{W) ~ Pr{—W) around the dominant work values 
and Pn* {W) ~ P{W) in the range of typical work values. 
In this case, a^p*] = (/ dW\PB{-W) - P{W)\f « 4, 
such that the number N^T of required trajectories con- 
verges to a constant value in the fast switching limit pro- 
vided the optimum bias is used. Equation (j24p also im- 
plies that the work values between the dominan-l — ht and 
typical ones are relatively unimportant. In particular, 
the work value W — AF does not need to be sampled at 
all. 

From this analysis of the optimum bias function one 
may infer that the most important property of good 
bias functions is that they lead to a biased ensemble in 
which both dominant and typical trajectories occur with 
high frequency. At first sight it seems easy to devise a 
procedure that does exactly that even without explic- 
itly using as bias function: just run an equal number of 
trajectories in forward and backward direction starting 
from the respective equilibrium initial conditions. Im- 
plicitly, this procedure corresponds to a bias function 
n(VF) = exp{-/3(H/ - AF)} -I- 1 and to the parame- 
ters a = 1, (/) = — 1, and c = of the generalized bias 
function II(VF). As can be seen in Fig. 31 a(a, 0) is very 
low but not a minimum at these parameter values since 
for slightly different parameters (a — 1.1, = —1.55) 
dominant and typical work values are sampled equally 
well, but work values in between are sampled less. Al- 



though it is easy to generate work values according to 
IV{W) — exp{— /3(W^ — AF)} -I- 1 using the procedure 
lined out above (without knowing the bias function it- 
self), problems appear somewhere else in this case. Ac- 
cording to Equ. ([3]) each contribution in the biased en- 
semble must be corrected by division through the bias 
function. Since the bias function itself depends on the 
unknown free energy difference A_F, however, the correc- 
tion cannot be carried out. 

The above analysis makes also clear why the inverse 
bias, ni(W^) = 1/P{W), works well: if all work val- 
ues in a sufficiently large range are sampled with the 
same frequency, the dominant and typical work val- 
ues occur with approximately the same weight as re- 
quired. The work values in between are sampled more 
than needed, but that does not strongly affect the error. 
For a Gaussian work distribution, the exponential bias 
ne(PF) = exp(— /3H^/2) leads to a preferred samphng of 
the work values between the typical and dominant ones. 
For slow driving both, typical and dominant work val- 
ues are included, but for strong driving neither ones are 
sampled with sufficient frequency leading to substantial 
statistical uncertainties. Also the generalized bias II(VF) 
of Equ. ((23)) leads to small errors if the resulting work 
distribution in the biased ensemble spans both the dom- 
inant and typical work values. The specific parameters 
4> and a at which that happens are, however, strongly 
model dependent such that no generally valid bias func- 
tion can be deduced. In conclusion, to reduce statistical 
errors bias functions for fast switching simulations must 
be designed such that both dominant and typical work 
values are sampled with high frequency. Devising such 
bias functions without knowledge of the free energy is 
challenging. 
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